Contrastive training
Focused Transformer: Contrastive Training for Context Scaling
Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of this approach is often constrained by limited effective context length. One solution is to give an attention layer access to an additional context, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, in which keys linked to different semantic values may overlap, making them hard to distinguish.
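To make the distraction issue concrete, here is a minimal, hypothetical sketch (not the paper's exact objective) of a contrastive loss that trains queries to score a key from their own document above keys drawn from unrelated documents; the tensor names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_key_loss(queries, pos_keys, neg_keys, temperature=0.1):
    """InfoNCE-style loss: each query should score its own document's
    key higher than distractor keys taken from other documents.

    queries:  (B, D) query vectors
    pos_keys: (B, D) matching keys from the same document
    neg_keys: (B, N, D) distractor keys from other documents
    """
    q = F.normalize(queries, dim=-1)
    kp = F.normalize(pos_keys, dim=-1)
    kn = F.normalize(neg_keys, dim=-1)

    pos_logits = (q * kp).sum(-1, keepdim=True)       # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", q, kn)    # (B, N)
    logits = torch.cat([pos_logits, neg_logits], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long) # positive sits at index 0
    return F.cross_entropy(logits, labels)

# toy usage with random stand-in vectors
B, N, D = 4, 16, 32
loss = contrastive_key_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, N, D))
```

As the number of distractors N grows, the loss directly penalizes a model that lets irrelevant keys crowd out the relevant one, which is the failure mode the abstract describes.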
Contrastive Training of Complex-Valued Autoencoders for Object Discovery
Current state-of-the-art object-centric models use slots and attention-based routing for binding. However, this class of models has several conceptual limitations: the number of slots is hardwired; all slots have equal capacity; training has high computational cost; and there are no object-level relational factors within slots. Synchrony-based models can, in principle, address these limitations by using complex-valued activations that store binding information in their phase components. However, working examples of such synchrony-based models have been developed only very recently, and they are still limited to toy grayscale datasets and the simultaneous storage of fewer than three objects in practice. Here we introduce architectural modifications and a novel contrastive learning method that greatly improve the state-of-the-art synchrony-based model. For the first time, we obtain a class of synchrony-based models capable of discovering objects in an unsupervised manner in multi-object color datasets and simultaneously representing more than three objects.
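To illustrate the idea of storing binding information in phase components, here is a hypothetical toy sketch that groups feature locations by the phase of complex-valued activations; the clustering rule and shapes are assumptions for illustration, not the paper's architecture.

```python
import torch

def bind_by_phase(features, num_objects):
    """Illustrative only: assign each spatial location to an object by
    clustering the phase angle of its complex-valued features, the
    component synchrony-based models use to encode binding.

    features: complex tensor of shape (H*W, D)
    returns:  integer object assignment per location
    """
    phase = torch.angle(features).mean(dim=-1)        # average phase per location
    # compare each phase to `num_objects` evenly spaced reference angles
    # on the unit circle and pick the closest one
    refs = torch.linspace(-torch.pi, torch.pi, num_objects + 1)[:-1]
    sim = torch.cos(phase[:, None] - refs[None, :])   # circular similarity
    return sim.argmax(dim=1)

# toy usage on random complex activations
feats = torch.randn(64, 8, dtype=torch.cfloat)
assignment = bind_by_phase(feats, num_objects=3)
```

Because binding lives in the phase rather than in discrete slots, the number of representable objects is not hardwired into the architecture, which is the limitation the abstract contrasts against.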
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
Gu, Jianyang, Stevens, Samuel, Campolongo, Elizabeth G., Thompson, Matthew J., Zhang, Net, Wu, Jiaman, Kopanev, Andrei, Mai, Zheda, White, Alexander E., Balhoff, James, Dahdul, Wasila, Rubenstein, Daniel, Lapp, Hilmar, Berger-Wolf, Tanya, Chao, Wei-Lun, Su, Yu
Foundation models trained at scale exhibit remarkable emergent behaviors, learning new capabilities beyond their initial training objectives. We find such emergent behaviors in biological vision models via large-scale contrastive vision-language training. To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We then train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BioCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings (e.g., beak sizes and habitats). At the intra-species level, instead of being diminished, the intra-species variations (e.g., life stages and sexes) are preserved and better separated in subspaces orthogonal to inter-species distinctions. We provide formal proof and analyses to explain why hierarchical supervision and contrastive objectives encourage these emergent properties. Crucially, our results reveal that these properties become increasingly significant with larger-scale training data, leading to a biologically meaningful embedding space.
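As an illustration of the training setup the abstract describes, the sketch below pairs images with captions built from taxonomic ranks and applies a standard CLIP-style symmetric contrastive loss; it is a hypothetical minimal example, not the BioCLIP 2 training code, and the caption template and embedding dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def taxonomy_caption(ranks):
    """Flatten a taxonomic path into a caption, e.g.
    'a photo of Animalia Chordata Aves ...' (template is an assumption)."""
    return "a photo of " + " ".join(ranks)

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (image, taxonomy-caption) pairs:
    each image should match its own caption and vice versa."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# toy usage with stand-in embeddings
caption = taxonomy_caption(["Animalia", "Chordata", "Aves"])
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

Captions assembled from the full taxonomic hierarchy are what make the supervision "hierarchical": species that share higher ranks share most of their caption text, so their embeddings are pulled into related regions.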
FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents
Dou, Zheng, Wang, Deqing, Zhuang, Fuzhen, Ren, Jian, Hu, Yanlin
Scientific document representation learning provides powerful embeddings for various tasks, but current methods face challenges across three approaches. 1) Contrastive training with citation-structural signals underutilizes citation information and still generates single-vector representations. 2) Fine-grained representation learning, which generates multiple vectors at the sentence or aspect level, requires costly integration and lacks domain generalization. 3) Task-aware learning depends on manually predefined task categories, overlooking nuanced task distinctions and requiring extra training data for task-specific modules. To address these problems, we propose FLeW, a new method that unifies the three approaches for better representations. Specifically, we introduce a novel triplet sampling method that leverages citation intent and frequency to strengthen citation-structural signals for training. Citation intents (background, method, result), aligned with the general structure of scientific writing, support a domain-generalized facet partition for fine-grained representation learning. We then adopt a simple weight search to adaptively integrate the three facet-level embeddings into a task-specific document embedding without task-aware fine-tuning. Experiments show the applicability and robustness of FLeW across multiple scientific tasks and fields compared to prior models.
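The "simple weight search" lends itself to a short illustration: the hypothetical sketch below grid-searches convex weights over three facet-level embeddings (background, method, result) and keeps the combination that maximizes a task-specific validation score; the function names and scoring function are assumptions, not FLeW's implementation.

```python
import itertools
import numpy as np

def combine(facets, weights):
    """Weighted sum of facet-level embeddings into one document vector.
    facets: (3, N, D) background/method/result embeddings for N documents."""
    w = np.asarray(weights)[:, None, None]
    return (w * facets).sum(axis=0)

def weight_search(facets, score_fn, step=0.1):
    """Grid-search convex weights (w1 + w2 + w3 = 1) and keep the mix
    that maximizes a task-specific validation score."""
    best_weights, best_score = None, -np.inf
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in itertools.product(grid, grid):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:
            continue  # outside the simplex
        weights = (w1, w2, max(w3, 0.0))
        score = score_fn(combine(facets, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score

# toy usage: score by mean cosine similarity to target vectors
facets = np.random.randn(3, 100, 64)
target = np.random.randn(100, 64)
def cosine_score(docs):
    num = np.einsum("nd,nd->n", docs, target)
    den = np.linalg.norm(docs, axis=1) * np.linalg.norm(target, axis=1)
    return float(np.mean(num / den))

weights, score = weight_search(facets, cosine_score)
```

Because the search only tunes three scalars on validation data, it adapts the document embedding to a task without any task-aware fine-tuning of the encoder itself.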
Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training
Diffusion models learn to denoise data, and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far outside of the training distribution (OOD), and the sampling process inevitably evaluates it in these OOD regions. This is problematic for all sampling methods, and especially for parallel sampling, which requires initializing and updating the entire sample trajectory of the dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show through diverse experiments that the proposed contrastive diffusion training is effective in both sequential and parallel settings, and that it significantly improves the performance and speed of parallel samplers.
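As a rough illustration of training a model to differentiate noise levels, the hypothetical sketch below corrupts clean samples with one of K noise scales and trains a classifier to identify which scale was used; a real diffusion model would derive these logits from the denoiser's implicit log-likelihood ratio, so the MLP head here is purely a stand-in and not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseLevelClassifier(nn.Module):
    """Toy head that predicts which of K noise levels corrupted an input."""
    def __init__(self, dim, num_levels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, num_levels)
        )

    def forward(self, x):
        return self.net(x)

def noise_contrast_loss(model, clean, sigmas):
    """Corrupt each clean sample with a randomly chosen noise scale and
    train the model to identify that scale (cross-entropy over levels)."""
    idx = torch.randint(len(sigmas), (clean.size(0),))
    noisy = clean + sigmas[idx, None] * torch.randn_like(clean)
    return F.cross_entropy(model(noisy), idx)

# toy usage on random data with four noise scales
model = NoiseLevelClassifier(dim=32, num_levels=4)
loss = noise_contrast_loss(model, torch.randn(16, 32),
                           torch.tensor([0.1, 0.3, 1.0, 3.0]))
```

Telling noise levels apart forces the model to behave sensibly on inputs noisier than any point on the standard training trajectory, which is exactly where OOD evaluations occur during parallel sampling.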
SECRET: Semi-supervised Clinical Trial Document Similarity Search
Das, Trisha, Shafquat, Afrah, Beigi, Mandis, Aptekar, Jacob, Sun, Jimeng
Clinical trials are vital for evaluating the safety and efficacy of new treatments. However, clinical trials are resource-intensive, time-consuming, and expensive to conduct; errors in trial design, reduced efficacy, and safety events can result in significant delays, financial losses, and damage to reputation. These risks underscore the importance of informed and strategic trial-design decisions that mitigate them and improve the chances of a successful trial. Identifying similar historical trials is critical, as such trials provide an important reference for potential pitfalls and challenges, including serious adverse events, dosage inaccuracies, recruitment difficulties, and patient adherence issues. Addressing these challenges during trial design can lead to more effective study protocols with optimized patient safety and trial efficiency. In this paper, we present a novel method that identifies similar historical trials by summarizing clinical trial protocols and searching for similar trials based on a query trial's protocol. Our approach significantly outperforms all baselines, achieving up to a 78% improvement in recall@1 and a 53% improvement in precision@1 over the best baseline. We also show that our method outperforms all other baselines in partial trial similarity search and zero-shot patient-trial matching, highlighting its superior utility in these tasks.
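The retrieval step can be pictured with a short, hypothetical sketch: embed protocol summaries with any text encoder and rank historical trials by cosine similarity to the query trial. The encoder below is a random stand-in just to keep the example self-contained; nothing here reflects SECRET's actual models or data.

```python
import numpy as np

def embed_summaries(summaries, embed_fn):
    """Embed protocol summaries with any text encoder; `embed_fn` is a
    stand-in for whatever model produces the vectors."""
    return np.stack([embed_fn(s) for s in summaries])

def top_k_similar(query_vec, corpus_vecs, k=5):
    """Cosine nearest neighbors of a query trial over historical trials."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# toy usage with a random "encoder" (assumption, for illustration only)
rng = np.random.default_rng(0)
fake_embed = lambda s: rng.standard_normal(64)
corpus = embed_summaries(["trial A protocol ...", "trial B protocol ..."], fake_embed)
idx, scores = top_k_similar(fake_embed("query trial protocol ..."), corpus, k=1)
```

Searching over summaries rather than full protocols keeps the index compact and focuses similarity on the design elements (endpoints, eligibility, dosing) that matter for reuse.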